Chinese Grammatical Error Correction Using Pre-trained Models and Pseudo Data

Abstract

In recent studies, pre-trained models and pseudo data have been key factors in improving the performance of the English grammatical error correction (GEC) task. However, few studies have examined the role of pre-trained models and pseudo data in the Chinese GEC task. Therefore, we develop Chinese GEC models based on three pre-trained models: BERT, T5, and BART, and then incorporate these models with pseudo data to determine the best configuration for the Chinese GEC task. On the Natural Language Processing and Chinese Computing (NLPCC) 2018 GEC shared task test set, all our single models outperform the ensemble models developed by the top team of the shared task. BART achieves an F score of 37.15, which is a state-of-the-art result. We then combine our Chinese GEC models with three kinds of pseudo data: Lang-8 (MaskGEC), Wiki (MaskGEC), and Wiki (Backtranslation). We find that most models can benefit from pseudo data, and that BART+Lang-8 (MaskGEC) is the ideal setting in terms of accuracy and training efficiency. The experimental results demonstrate the effectiveness of pre-trained models and pseudo data on the Chinese GEC task and provide an easily reproducible and adaptable baseline for future works. Finally, we annotate the error types of the development data; the results show that word-level errors dominate all error types, and that word selection errors must be addressed even when using pre-trained models and pseudo data. Our codes are available at https://github.com/wang136906578/BERT-encoder-ChineseGEC .
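The abstract reports only an "F score". As a rough illustration of how such a score is usually derived in GEC evaluation, the sketch below computes an F-beta value from counts of correct, spurious, and missed edits. The function name `f_beta`, the edit counts, and the default `beta = 0.5` (weighting precision over recall, as is common in GEC shared tasks) are assumptions for illustration. They are not taken from the paper or its repository.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from edit counts (hypothetical helper, not the paper's scorer).

    tp: system edits that match a reference edit
    fp: system edits that match no reference edit
    fn: reference edits the system missed
    beta < 1 weights precision more heavily than recall.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Made-up counts: 3 correct edits, 1 spurious edit, 3 missed edits.
    print(round(100 * f_beta(tp=3, fp=1, fn=3), 2))
```

With `beta = 0.5`, a system that proposes few but mostly correct edits scores higher than one that proposes many edits with the same number of hits.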

Related Resources

Generating and Scoring Correction Candidates in Chinese Grammatical Error Diagnosis

Grammatical error diagnosis is an essential part in a language-learning tutoring system. Based on the data sets of Chinese grammar error detection tasks, we proposed a system which measures the likelihood of correction candidates generated by deleting or inserting characters or words, moving substrings to different positions, substituting prepositions with other prepositions, or substituting wo...

Grammatical Error Correction Using Integer Linear Programming

We propose a joint inference algorithm for grammatical error correction. Different from most previous work where different error types are corrected independently, our proposed inference process considers all possible errors in a unified framework. We use integer linear programming (ILP) to model the inference process, which can easily incorporate both the power of existing error classifiers and ...

Grammatical error correction using neural machine translation

This paper presents the first study using neural machine translation (NMT) for grammatical error correction (GEC). We propose a two-step approach to handle the rare word problem in NMT, which has been proved to be useful and effective for the GEC task. Our best NMT-based system trained on the CLC outperforms our SMT-based system when testing on the publicly available FCE test set. The same system...

Grammatical Error Correction

Grammatical error correction (GEC) is the task of automatically correcting grammatical errors in written text. Earlier attempts to grammatical error correction involve rule-based and classifier approaches which are limited to correcting only some particular type of errors in a sentence. As sentences may contain multiple errors of different types, a practical error correction system should be ab...

Chinese Grammatical Error Diagnosis Using Ensemble Learning

Automatic grammatical error detection for Chinese has been a big challenge for NLP researchers for a long time, mostly due to the flexible and irregular ways in the expressing of this language. Strictly speaking, there is no evidence of a series of formal and strict grammar rules for Chinese, especially for the spoken Chinese, making it hard for foreigners to master this language. The CFL share...

Journal

Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing

Year: 2023

ISSN: 2375-4699, 2375-4702

DOI: https://doi.org/10.1145/3570209